Abstract

Cellular Automata (CA) are discrete dynamical systems that evolve following simple and local rules. Despite this local
simplicity, knowledge discovery in CA is an NP problem. This is the main motivation for using data mining techniques for CA
study. Principal Component Analysis (PCA) is a useful tool for data mining because it provides a compact and optimal
description of data sets. This feature has been explored to compute the best subspace, the one that maximizes the projection of
the I/O patterns of CA onto the principal axes. The stability of the principal components against the input patterns is the
main result of this approach. In this paper we perform such analysis in the presence of noise which randomly reverses
the CA output values with probability p. As expected, the number of principal components increases when the pattern size
is increased. However, it seems to remain stable when the pattern size is unchanged but the noise intensity gets larger. We
describe our experiments and point out further work using KL transform theory and parameter sensitivity analysis.
1. Introduction



Data mining refers to extracting or mining knowledge (interesting patterns) from large amounts of
data [24]. The techniques in this field range from computational methods (decision trees, data structures, etc.) and statistical
and regression methods (correlation, linear and logistic regression, cluster analysis, etc.) to neural networks and dimension reduction
(PCA and Singular Value Decomposition) [24].

When applied to scientific data sets, such patterns lead to conjectures about system behavior and properties, which must
then be analyzed within a theoretical framework in order to confirm them. That is the philosophy of this work. Following
[14], we apply PCA methods to cellular automata analysis.

Cellular automata are discrete dynamical systems [12][13] originally proposed by Von Neumann [31] (see also [34] for
a brief history). They consist of a lattice of discrete identical sites, each site taking on a finite set of values [40, 2]. The
values of the sites evolve in discrete time steps according to simple rules that update the value of each site in terms of the
values of neighboring sites [3][43]. Cellular automata are a rich field of investigation that includes computational aspects
like universality, languages/grammars and state transition diagrams [41][40], statistical mechanics and probability
(self-organization, Markov theory, fractals, etc.) [43][20][35][39][21], algebraic methods (matrix algebra, polynomials over finite
fields, etc.) [29][43][32][36][1][11][10][7], among others [2][6]. They have been applied to pattern classification and recognition
[30][19][20], pattern generation [5][6][10], hardware architectures for massively parallel computation [18][22], models for
biological processes [15][16][25] and physical systems simulation [17][28][33][42][37][8, 9].



Despite this local simplicity, knowledge discovery in CA is an NP problem [43]. This is the main motivation for using
data mining techniques for CA study. For instance, in [14] a set of binary one-dimensional cellular automata is considered.
Each such CA is fed with a set of input patterns and the resulting output database is analyzed through Principal Component
Analysis.

In this paper we follow this viewpoint, but in the presence of noise which randomly reverses the CA output values with
probability p. As expected, the number of principal components increases when the pattern size is increased. However, it
seems to remain stable when the pattern size is unchanged but the noise intensity gets larger. We describe our experiments
and point out further work using KL transform theory and parameter sensitivity analysis.

This paper is organized as follows. The next section presents the basic concepts of CAs and how computationally intractable
problems arise in this area. Next, in Section 3 we discuss Principal Component Analysis. The application of PCA to
cellular automata analysis is reviewed in Section 4. Finally, we present our results and final comments in Sections 5 and 6,
respectively.

2. Cellular Automata 

A cellular automaton (CA) is a quadruplet $A = (L, S, N, f)$ where $L$ is a set of indices or sites, $S$ is the finite set of
site values or states, $N : L \to L^k$ is a one-to-many mapping defining the neighborhood of every site $i$ as a collection of
$k$ sites, and $f : S^k \to S$ is the evolution function of $A$ [43][3]. The neighborhood of site $i$ is defined as the set
$N(i) = \{ j ; |j - i| \le [(k - 1)/2] \}$ ($[x]$ stands for the integer part of $x$); note that a given site may or may not be included in
its own neighborhood. Since the set of states is finite, $\{f_j\}$ will denote the set of possible rules of the CA, taken among the
$p = (\#S)^{(\#S)^k}$ rules.

For a one-dimensional cellular automaton the lattice $L$ is an array of sites, and the transition rule $f$ updates a site value
according to the values of a neighborhood of $k = 2r + 1$ sites around it, that is:

$$ f : S^{2r+1} \to S, \tag{1} $$

$$ a_i^{t+1} = f\left(a_{i-r}^{t}, \ldots, a_{i-1}^{t}, a_i^{t}, a_{i+1}^{t}, \ldots, a_{i+r}^{t}\right), \tag{2} $$

$$ a_j^{t} \in S, \quad j = i - r, \ldots, i + r, \tag{3} $$

where $t$ is the evolution time, also taking discrete values, and $a_i^t$ is the value of site $i$ at time $t$ [43][2] (see also
[13, 8] for on-line examples). Therefore, given a configuration of site values at time $t$, it is updated through the application
of the transition rule to generate the new configuration at time $t + 1$, and so on. In the case of $r = 1$ in expression (3) and
$S = \{0, 1\}$ we have a special class of cellular automata which has been extensively studied in the CA literature [34, 10, 7, 43].
Figure 1 shows a well-known example of such a CA. The rule in this case is:

$$ a_i^{t+1} = \left(a_{i-1}^{t} + a_{i+1}^{t}\right) \bmod 2, \tag{4} $$

that is, the remainder of the division by two. The figure shows the evolution of an initial configuration in which
only one site has the value 1.

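Since $r = 1$ in expression (3), it is easy to check that this rule is defined by the function:

$$ \begin{array}{cccccccc} 111 & 110 & 101 & 100 & 011 & 010 & 001 & 000 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \end{array} \tag{5} $$

$$ 0 \cdot 2^7 + 1 \cdot 2^6 + 0 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 90 \tag{6} $$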
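The rule indexation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`rule_number`, `rule_table`, `step`) are hypothetical, and the choice of treating cells outside a finite lattice as 0 is an assumption, not something stated in the text.

```python
def rule_number(outputs_111_to_000):
    """Rule number from the 8 output bits listed for neighbourhoods 111, 110, ..., 000."""
    return sum(bit << (7 - k) for k, bit in enumerate(outputs_111_to_000))

def rule_table(number):
    """table[v] is the output for the neighbourhood whose three bits spell the integer v."""
    return [(number >> v) & 1 for v in range(8)]

def step(cells, number):
    """One synchronous update of a finite lattice (assumption: outside cells are 0)."""
    table = rule_table(number)
    padded = [0] + list(cells) + [0]
    return [table[4 * padded[i - 1] + 2 * padded[i] + padded[i + 1]]
            for i in range(1, len(padded) - 1)]

# The output row of expression (5) gives back the rule number of expression (6).
print(rule_number([0, 1, 0, 1, 1, 0, 1, 0]))

# A few steps from a single site with value 1, as in Figure 1.
row = [0] * 7 + [1] + [0] * 7
for _ in range(4):
    print("".join(".#"[c] for c in row))
    row = step(row, 90)
```

With this encoding, rule 90 coincides with the sum modulo 2 of the two neighbors in expression (4), which can be checked neighborhood by neighborhood.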

From this example, we see that there are $2^8 = 256$ such rules and that each one can be assigned a rule number
following the indexation illustrated in expression (6). In [41], Wolfram proposes four basic classes of behavior for these rules
(see also [3]):

Class 1: Evolution leads to a homogeneous state in which all the sites have the same value (Figure 2a);
Class 2: Evolution leads to a set of stable and periodic structures that are separated and simple (Figure 2b);
Class 3: Evolution leads to a chaotic pattern (Figure 2c);
Class 4: Evolution leads to complex structures (Figure 2d).



Figure 1. Evolution of the CA given by rule number 90. In this case, the initial configuration is a
finite one-dimensional lattice which has only one site with the value 1 (pictured in black). Source:
www.stephenwolfram.com/publications/articles/ca/




Figure 2. Some examples of Wolfram's classification for one-dimensional r = 1 CAs. Source: 
www.stephenwolfram.com/publications/articles/ca/ 



Other classifications based on Markovian processes and group properties can also be found in the literature [21][11].

Despite this local simplicity, knowledge discovery in CA is an NP problem. In fact, let us take a one-dimensional CA with
a finite lattice $L$ of size $d$. One may consider the question of whether a particular sequence of $d$ site values can occur after $T$
time steps in the evolution of the cellular automaton, starting from any initial state. Then, one may ask whether there exists
any algorithm that can determine the answer in a time given by some polynomial in $d$ and $T$. The question can certainly be
answered by testing all possible sequences of initial site values, of which there are $(\#S)^d$. But this procedure requires a time that grows
exponentially with $d$.

Nevertheless, if an initial sequence could be guessed, then it could be tested in a time polynomial in $d$ and $T$. As
a consequence, the problem is in the class NP, which motivates the application of data mining techniques for knowledge
discovery in CA. The next sections review basic PCA theory and its application to the analysis of the (traditional) set of
256 one-dimensional cellular automata rules obtained when $r = 1$, $S = \{0, 1\}$.
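The two halves of this argument, exhaustive search versus fast verification of a guessed initial state, can be illustrated with a small sketch. It assumes a binary $r = 1$ CA on a ring (periodic boundary), which is a choice made here for simplicity; the names `step`, `evolve` and `find_preimage` are hypothetical.

```python
from itertools import product

def step(cells, number):
    """One update of rule `number` (r = 1, S = {0, 1}) on a ring of len(cells) sites."""
    d = len(cells)
    return tuple((number >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % d])) & 1
                 for i in range(d))

def evolve(cells, number, T):
    """Apply the rule T times; the cost is polynomial in d and T."""
    for _ in range(T):
        cells = step(cells, number)
    return tuple(cells)

def find_preimage(target, number, T):
    """Exhaustive search over all 2**d initial configurations (exponential in d)."""
    target = tuple(target)
    for candidate in product((0, 1), repeat=len(target)):
        if evolve(candidate, number, T) == target:  # verifying one guess is cheap
            return candidate
    return None

print(find_preimage((0, 1, 0, 1), 90, 1))
print(find_preimage((1, 0, 0, 0), 90, 1))  # this sequence has no predecessor on a ring of 4 sites
```

The loop in `find_preimage` is the expensive part; each call to `evolve` inside it is the polynomial-time check of a guessed initial sequence.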



3. Principal Component Analysis 

Principal Component Analysis (PCA), also called the Karhunen-Loeve (KL) method, can be seen as a method for data
compression or dimensionality reduction [4] (see also [27], Section 5.11). Thus, let us suppose that the data to be compressed
consist of $N$ tuples or data vectors from an $n$-dimensional space. Then, PCA searches for $k$ $n$-dimensional orthonormal vectors
that can best be used to represent the data, where $k \le n$. Figures 3a and 3b picture this idea using a two-dimensional representation.
If we suppose that the data points are distributed over the ellipse, it follows that the rotated coordinate system shown in
Figure 3b is more suitable for representing the data set, in a sense that will be formally described next.

Thus, let $S = \{u_1, u_2, \ldots, u_N\}$ be the data set represented in Figure 3. For now, let us suppose that the centroid of the data
set is the center of the coordinate system, that is:

$$ C_M = \frac{1}{N} \sum_{i=1}^{N} u_i = 0. \tag{7} $$




Figure 3. (a)Original dataset. (b) Extraction of the principal component. 

To address the issue of compression, we need a vector basis that satisfies a proper optimization criterion (rotated axes
in Figure 3b). Following [27], consider the operations in Figure 4. The vector $u_j$ is first transformed to a vector $v_j$ by the
matrix (transformation) $A$. Then, we truncate $v_j$ by choosing its first $m$ elements. The obtained vector $w_j$ is just the
transformation of $v_j$ by $I_m$, that is, a matrix with 1s along the first $m$ diagonal elements and zeros elsewhere. Finally, $w_j$ is
transformed to $z_j$ by the matrix $B$. Let the square error be defined as follows:

$$ J_m = \frac{1}{N} \sum_{j=1}^{N} \left\| u_j - z_j \right\|^2 = \frac{1}{N} \mathrm{Tr} \left[ \sum_{j=1}^{N} (u_j - z_j)(u_j - z_j)^{*T} \right], \tag{8} $$

where Tr means the trace of the matrix between the square brackets and the notation $(\cdot)^{*T}$ means the transpose of the complex
conjugate of a matrix. Following Figure 4, we observe that $z_j = B I_m A u_j$. Thus we can rewrite (8) as:



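Figure 4. KL Transform formulation: $u_j$ is mapped by $A$ ($n \times n$) to $v_j$, truncated by $I_m$ ($1 \le m \le n$) to $w_j$, and mapped by $B$ ($n \times n$) to $z_j$. Reprinted from [27].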



$$ J_m = \frac{1}{N} \mathrm{Tr} \left[ \sum_{j=1}^{N} (u_j - B I_m A u_j)(u_j - B I_m A u_j)^{*T} \right], \tag{9} $$

which yields:

$$ J_m = \frac{1}{N} \mathrm{Tr} \left[ (I - B I_m A) R (I - B I_m A)^{*T} \right], \tag{10} $$

where:

$$ R = \sum_{j=1}^{N} u_j u_j^{*T}. \tag{11} $$

Following the literature, we call $R$ the covariance matrix. We can now state the optimization problem: we
want to find the matrices $A$, $B$ that minimize $J_m$. The next theorem gives the solution to this problem.

Theorem 1: The error $J_m$ in expression (10) is minimum when

$$ A = \Phi^{*T}, \quad B = \Phi, \quad AB = BA = I, \tag{12} $$

where $\Phi$ is the matrix whose columns are the orthonormalized eigenvectors of $R$, arranged in decreasing order of their
eigenvalues.

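Proof. To minimize $J_m$ we first observe that $J_m$ must be zero if $m = n$. Thus, the only possibility would be

$$ I = BA \;\Rightarrow\; A = B^{-1}. \tag{13} $$

Besides, by remembering that

$$ \mathrm{Tr}(CD) = \mathrm{Tr}(DC), \tag{14} $$

we can also write:

$$ J_m = \frac{1}{N} \mathrm{Tr} \left[ (I - B I_m A)^{*T} (I - B I_m A) R \right]. \tag{15} $$

Again, this expression must be null if $m = n$. Thus:

$$ J_n = \frac{1}{N} \mathrm{Tr} \left[ \left( I - BA - A^{*T} B^{*T} + A^{*T} B^{*T} B A \right) R \right]. $$

This error is minimum if:

$$ B^{*T} B = I, \quad A^{*T} A = I, \tag{16} $$

that is, if $A$ and $B$ are unitary matrices; together with (13), this gives $B = A^{*T}$. The next condition comes from
differentiating $J_m$ with respect to the elements of $A$. We set the result to zero in order to obtain the necessary
condition to minimize $J_m$. This yields:

$$ I_m A^{*T} \left( I - A^{*T} I_m A \right) R = 0, \tag{17} $$

which renders:

$$ J_m = \frac{1}{N} \mathrm{Tr} \left[ \left( I - A^{*T} I_m A \right) R \right]. \tag{18} $$

By using the property (14), the last expression can be rewritten as

$$ J_m = \frac{1}{N} \mathrm{Tr} \left[ R - I_m A R A^{*T} \right]. $$

Since $R$ is fixed, $J_m$ will be minimized if

$$ \tilde{J}_m = \mathrm{Tr} \left[ I_m A R A^{*T} \right] = \sum_{i=0}^{m-1} a_i^{T} R a_i^{*} \tag{19} $$

is maximized, where $a_i^{T}$ is the $i$th row of $A$. Since $A$ is unitary, we must impose the constraint:

$$ a_i^{T} a_i^{*} = 1. \tag{20} $$

Thus, we shall maximize $\tilde{J}_m$ subject to the last condition. The Lagrangian has the form:

$$ \tilde{J}_m = \sum_{i=0}^{m-1} a_i^{T} R a_i^{*} + \sum_{i=0}^{m-1} \lambda_i \left( 1 - a_i^{T} a_i^{*} \right), $$

where the $\lambda_i$ are the Lagrange multipliers. By differentiating this expression with respect to $a_i$ we get:

$$ R a_i^{*} = \lambda_i a_i^{*}. \tag{21} $$

Thus, the $a_i^{*}$ are orthonormalized eigenvectors of $R$. Substituting this result in expression (19) produces:

$$ \tilde{J}_m = \sum_{i=0}^{m-1} \lambda_i, \tag{22} $$

which is maximized if $\{a_i^{*}, i = 0, 1, \ldots, m - 1\}$ correspond to the largest eigenvalues of $R$. (□)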
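As a numerical illustration of Theorem 1, the following NumPy sketch builds $A$ and $B$ from the eigenvectors of $R$ and checks that the truncation error equals the sum of the discarded eigenvalues divided by $N$. It assumes real, zero-mean data stored as rows of an array and the unnormalized $R = \sum_j u_j u_j^T$; the function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def kl_basis(U):
    """U: (N, n) array of real, zero-mean data vectors (one per row)."""
    R = U.T @ U                        # R = sum_j u_j u_j^T (assumed unnormalized)
    eigvals, Phi = np.linalg.eigh(R)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]  # rearrange in decreasing order
    eigvals, Phi = eigvals[order], Phi[:, order]
    return Phi.T, Phi, eigvals         # A = Phi^T, B = Phi for real data

def truncation_error(U, A, B, m):
    """J_m = (1/N) * sum_j ||u_j - B I_m A u_j||^2."""
    n = U.shape[1]
    I_m = np.diag([1.0] * m + [0.0] * (n - m))
    Z = U @ (B @ I_m @ A).T            # row j holds z_j = B I_m A u_j
    return np.sum((U - Z) ** 2) / U.shape[0]

rng = np.random.default_rng(0)
U = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
U = U - U.mean(axis=0)                 # enforce the zero-centroid assumption
A, B, eigvals = kl_basis(U)
for m in range(1, 6):
    print(m, truncation_error(U, A, B, m), eigvals[m:].sum() / U.shape[0])
```

The two printed columns agree up to rounding, which is the content of the theorem for this basis: keeping the $m$ leading eigenvectors leaves exactly the trailing eigenvalues as error.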

A straightforward variation of the above statement is obtained if we have a random vector $u$ with zero mean. In this case,
the pipeline of Figure 4 yields a random vector $z$ and the square error can be expressed as:

$$ J_m = \frac{1}{n} \mathrm{Tr} \left[ E \left\{ (u - B I_m A u)(u - B I_m A u)^{*T} \right\} \right], $$

which can be written as:

$$ J_m = \frac{1}{n} \mathrm{Tr} \left[ (I - B I_m A) R (I - B I_m A)^{*T} \right], \tag{23} $$

where $R = E(u u^{*T})$ is the covariance matrix. Besides, if $C_M$ in expression (7) is not zero, we must translate the coordinate
system to $C_M$ before computing the matrix $R$, that is:

$$ \tilde{u}_j = u_j - C_M. \tag{24} $$

In this case, matrix $R$ will be given by:

$$ R = \sum_{j=1}^{N} \tilde{u}_j \tilde{u}_j^{*T}. $$

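Also, sometimes it may be useful to consider in expression (8) some other norm, not necessarily the 2-norm. In this case,
there will be a real, symmetric and positive-definite matrix $M$ that defines the norm. Thus, the square error $J_m$ will be
rewritten in a more general form:

$$ J_m = \frac{1}{N} \sum_{j=1}^{N} (u_j - z_j)^{*T} M (u_j - z_j). \tag{25} $$

Obviously, if $M = I$ we recover expression (8). The link between this case and the above one is easily obtained by
observing that there is a non-singular and real matrix $W$ such that:

$$ W^{T} M W = I. \tag{26} $$

The matrix $W$ defines the transformation:

$$ u_j = W \hat{u}_j, \quad z_j = W \hat{z}_j. \tag{27} $$

Thus, by inserting these expressions in equation (25) we obtain:

$$ J_m = \frac{1}{N} \sum_{j=1}^{N} (\hat{u}_j - \hat{z}_j)^{*T} W^{T} M W (\hat{u}_j - \hat{z}_j). \tag{28} $$

Expression (28) can be written as:

$$ J_m = \frac{1}{N} \sum_{j=1}^{N} \left\| \hat{u}_j - \hat{z}_j \right\|^2, \tag{29} $$

now using the 2-norm, like in expression (8). Therefore:

$$ J_m = \frac{1}{N} \mathrm{Tr} \left[ \sum_{j=1}^{N} (\hat{u}_j - \hat{z}_j)(\hat{u}_j - \hat{z}_j)^{*T} \right]. \tag{30} $$

Following the same development performed above, we will find that we must solve the equation:

$$ \hat{R} \hat{a}_i^{*} = \lambda_i \hat{a}_i^{*}, \tag{31} $$

where:

$$ \hat{R} = \sum_{j=1}^{N} \hat{u}_j \hat{u}_j^{*T}. \tag{32} $$

Thus, from transformations (27), that is, $\hat{u}_j = W^{-1} u_j$, it follows that:

$$ \hat{R} = W^{-1} R \left( W^{-1} \right)^{T}, \tag{33} $$

and, therefore, we must solve the following eigenvalue/eigenvector problem:

$$ \left( W^{-1} R \left( W^{-1} \right)^{T} \right) \hat{a}_i^{*} = \lambda_i \hat{a}_i^{*}. \tag{34} $$

The eigenvectors, in the original coordinate system, are finally given by:

$$ a_i^{*} = W \hat{a}_i^{*}. \tag{35} $$

The next section shows the application of the PCA method to knowledge discovery in CAs.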
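The change of norm can be checked numerically with a short NumPy sketch. It assumes real data and adopts the convention $u_j = W \hat{u}_j$ with $W^T M W = I$, choosing $W$ from a Cholesky factor of $M$; this particular $W$, the function name `metric_pca` and the test data are assumptions made for illustration only.

```python
import numpy as np

def metric_pca(U, M):
    """U: (N, n) real zero-mean data (rows); M: (n, n) symmetric positive-definite."""
    L = np.linalg.cholesky(M)           # M = L L^T
    W = np.linalg.inv(L).T              # one possible W with W^T M W = I
    U_hat = U @ L                       # row j is (W^{-1} u_j)^T, because W^{-1} = L^T
    R_hat = U_hat.T @ U_hat             # covariance in the transformed coordinates
    eigvals, A_hat = np.linalg.eigh(R_hat)
    order = np.argsort(eigvals)[::-1]
    eigvals, A_hat = eigvals[order], A_hat[:, order]
    A = W @ A_hat                       # eigenvectors back in the original coordinates
    return eigvals, A, W

rng = np.random.default_rng(0)
U = rng.normal(size=(100, 3))
U = U - U.mean(axis=0)
M = np.diag(1.0 / U.var(axis=0))        # example metric: inverse variances
eigvals, A, W = metric_pca(U, M)
print(eigvals)
print(np.allclose(W.T @ M @ W, np.eye(3)))
```

Under these assumptions the columns of `A` satisfy $R M a = \lambda a$ with $R = \sum_j u_j u_j^T$, and they are orthonormal with respect to $M$, which gives a quick consistency check on the back-transformation.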



4. PCA and Cellular Automata 

In this section we review the work presented in [14]. In this reference, the authors analyzed one-dimensional CAs using
PCA. The key idea is to consider binary patterns of a pre-defined size $l$ as inputs of the CAs. The 256 one-dimensional
CA rules obtained for $r = 1$ and $S = \{0, 1\}$ in expression (2) are considered. The output can be collected in a table, like Table 1,
built for $l = 5$.



Patterns   R_0   R_1   ...   R_254   R_255
00000      000   111   ...   000     111
00001      000   110   ...   001     111
...        ...   ...   ...   ...     ...
11110      000   000   ...   111     111
11111      000   000   ...   111     111

Table 1. Table whose rows are indexed by binary patterns and columns by the CA rules R_0, R_1, ..., R_255.



Each column $j$ of Table 1 is obtained through the application of the rule $R_j$ (see expression (6) for an example of rule indexation).
Then, I/O patterns are converted to cardinal numbers denoted by $f_j(m_i)$, which means the cardinal number corresponding to
the application of rule $j$ to the pattern $i$ ($i = 0, 1, \ldots, 31$ for Table 1). Thus, in general we get the matrix:

$$ F = \begin{pmatrix} f_{11} & \cdots & f_{1p} \\ \vdots & & \vdots \\ f_{n1} & \cdots & f_{np} \end{pmatrix}, \tag{36} $$

where $f_{ij} = f_j(m_i)$. The matrix $F$ is the data set to be analyzed.
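A possible construction of $F$ is sketched below. It assumes, as Table 1 suggests, that each rule is applied only to the $l - 2$ interior sites of a pattern (so that patterns of size 5 give 3 output bits) and that the output bits are read as a binary number with the leftmost site most significant; `apply_rule` and `build_F` are hypothetical names, and indices start at 0 rather than 1.

```python
import numpy as np

def apply_rule(cells, number):
    """Output bits of rule `number` on the interior sites of a pattern."""
    return [(number >> (4 * a + 2 * b + c)) & 1
            for a, b, c in zip(cells, cells[1:], cells[2:])]

def build_F(l):
    """F[i, j]: cardinal number of the output of rule j on pattern i (size l)."""
    n, p = 2 ** l, 256
    F = np.zeros((n, p), dtype=np.int64)
    for i in range(n):
        cells = [(i >> (l - 1 - k)) & 1 for k in range(l)]  # pattern i, leftmost bit first
        for j in range(p):
            value = 0
            for bit in apply_rule(cells, j):
                value = 2 * value + bit
            F[i, j] = value
    return F

F = build_F(5)
print(F.shape)
print(F[1, 1], F[1, 254])  # pattern 00001 under R_1 (110) and R_254 (001)
```

The entries reproduce the rows of Table 1 once the 3-bit outputs are converted to integers.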

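To mine knowledge in $F$ through PCA we should first perform the translation given by (24). Thus,
matrix $F$ is converted to the following one:

$$ X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}, \tag{37} $$

with:

$$ x_{ij} = f_{ij} - \bar{f}_j, \tag{38} $$

$$ \bar{f}_j = \frac{1}{n} \sum_{i=1}^{n} f_{ij}. \tag{39} $$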

The matrix $X$ is of size $n \times p$. In [14] the columns $x_1, \ldots, x_p$ of $X$ are called variables while the rows $e_1, \ldots, e_n$ are called covariables.
However, we must observe that the space dimension is the number of rules ($p$) and the number of data vectors is the number of
patterns ($n$). Thus, following Section 3, we should apply PCA to the data set given by matrix $X^T$ in order to find
the principal components of the covariables space. Besides, in [14] the norm in the covariables space is defined by:



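$$ \| e \|_M^2 = e^{T} M e, \quad M = \mathrm{diag}\left( 1/\sigma_1^2, \ldots, 1/\sigma_p^2 \right), \tag{40} $$

with:

$$ \sigma_j^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_{ij} - E_j \right)^2, \tag{41} $$

where $E_j$ denotes the mean of column $j$ of $X$.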



Following Section 3, we must solve equation (34) to find the eigenvalues and then apply expression (35) to get the
eigenvectors in the desired representation. Table 2 shows the largest eigenvalues of this matrix for the listed pattern sizes.

The main result is that the eigenvalues beyond the seventh are dramatically smaller in magnitude ($10^4$ times) than the
first seven. This observation led the authors of [14] to the following conjecture:

Conjecture: The rank of $R$ is 7 and does not depend on the size $l$ of the patterns being considered. When $l$ is increased the
eigenvalues tend to characteristic values obtained for $l = 12$.

This is the main result presented in [14]. Next, we show our results, obtained by applying the same analysis but introducing
randomness in the CA behavior.



l    λ1        λ2        λ3        λ4        λ5        λ6        λ7
4    52.6802   48.2214   36.8869   36.8263   36.3134   24.4539   18.6179
5    58.2575   50.9776   37.2301   37.0399   30.7382   21.7355   18.0214
6    59.5952   51.6519   37.3406   37.1109   29.3769   21.0940   17.8305
7    59.9260   51.8197   37.3696   37.1296   29.0383   20.9358   17.7811
9    60.0290   51.8721   37.3788   37.1355   28.9325   20.8865   17.7656
12   60.0358   51.8755   37.3794   37.1359   28.9256   20.8833   17.7645

Table 2. Eigenvalues of the correlation matrix.
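The noise-free analysis can be sketched as follows. This is a sketch under stated assumptions rather than the procedure of [14]: outputs are taken on interior sites only, constant columns (such as rules 0 and 255, whose variance is zero) are dropped before standardization, and the eigenvalues of the correlation matrix are obtained from the singular values of the standardized data. All function names and the tolerance `rel_tol` are hypothetical.

```python
import numpy as np

def build_F(l):
    """F[i, j]: output of rule j on pattern i (interior sites), read as a binary number."""
    F = np.zeros((2 ** l, 256), dtype=np.int64)
    for i in range(2 ** l):
        cells = [(i >> (l - 1 - k)) & 1 for k in range(l)]
        codes = [4 * a + 2 * b + c for a, b, c in zip(cells, cells[1:], cells[2:])]
        for j in range(256):
            value = 0
            for v in codes:
                value = 2 * value + ((j >> v) & 1)
            F[i, j] = value
    return F

def correlation_eigenvalues(F):
    """Nonzero-candidate eigenvalues of the correlation matrix of the columns of F."""
    X = F - F.mean(axis=0)                   # centre each rule (column)
    sigma = X.std(axis=0)
    keep = sigma > 0                         # drop constant columns (assumption)
    Z = X[:, keep] / sigma[keep]             # standardized variables
    s = np.linalg.svd(Z, compute_uv=False)   # singular values, in decreasing order
    return s ** 2 / Z.shape[0]

def count_components(eigvals, rel_tol=1e-8):
    """Number of eigenvalues that are not negligible relative to the largest one."""
    return int(np.sum(eigvals > rel_tol * eigvals.max()))

eig = correlation_eigenvalues(build_F(5))
print(np.round(eig[:8], 4))
print(count_components(eig))
```

Comparing the printed leading eigenvalues with the $l = 5$ row of Table 2 is a convenient way to see whether these conventions match those used in [14].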



5. PCA and Probabilistic Cellular Automata 

In this section we report some experimental results obtained in the presence of noise which randomly reverses the CA
output values with probability $p$. In the first experiment, reported in Table 3, we set $l = 5$, take some values for the
probability $p$ and compute the PCA for the generated matrix.

We observe that the number of principal components is 31 for all tests. When the pattern size is increased ($l = 6, 7, 9, 12$)
we observe the same behavior (see Tables 4 and 5 for $l = 6$), but the number of principal components increases to
63, 127, 254, 254, respectively. In these tests, for a fixed pattern size, the noise intensity ($p$ value) did not seem to have a
considerable effect on the number of principal components if $p \ge 0.2$. However, if $p = 0.0$, we know from the conjecture of
Section 4 that this number is 7 for all cases considered. What happens for $0.0 < p < 0.2$?

This question must be considered in further work from the viewpoint of the KL transform, following the procedure of
Section 3 for a random field $u(i, j)$, in order to have a complete answer.

However, an interesting point is that this question resembles the problem of studying the influence of control parameters
in continuous dynamical systems [23]. With such parameters we can control the influence of factors like temperature,
viscosity, irradiation, etc. Those systems can be analyzed through stability analysis [26, 23], bifurcation and catastrophe
theory [26, 1] and perturbation [23]. In that context, there may be critical values for the parameters, in the sense that sudden
changes happen near them. As an example, let us consider a simple dynamical system:
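A noise experiment of this kind can be sketched as follows. The sketch assumes that each output bit is reversed independently with probability $p$ before conversion to a cardinal number, that outputs are taken on interior sites only, and that a component is counted when its eigenvalue is not negligible relative to the largest one; none of these details is specified in the text, and the names `noisy_F`, `n_components` and `rel_tol` are hypothetical.

```python
import numpy as np

def noisy_F(l, p_flip, rng):
    """Matrix F where every CA output bit is reversed with probability p_flip."""
    F = np.zeros((2 ** l, 256), dtype=np.int64)
    for i in range(2 ** l):
        cells = [(i >> (l - 1 - k)) & 1 for k in range(l)]
        codes = [4 * a + 2 * b + c for a, b, c in zip(cells, cells[1:], cells[2:])]
        for j in range(256):
            value = 0
            for v in codes:
                bit = (j >> v) & 1
                if rng.random() < p_flip:    # reverse this output value
                    bit = 1 - bit
                value = 2 * value + bit
            F[i, j] = value
    return F

def n_components(F, rel_tol=1e-8):
    """Count non-negligible eigenvalues of the correlation matrix of F's columns."""
    X = F - F.mean(axis=0)
    sigma = X.std(axis=0)
    keep = sigma > 0                         # drop constant columns (assumption)
    Z = X[:, keep] / sigma[keep]
    eig = np.linalg.svd(Z, compute_uv=False) ** 2 / Z.shape[0]
    return int(np.sum(eig > rel_tol * eig.max()))

rng = np.random.default_rng(0)
for p_flip in (0.0, 0.05, 0.1, 0.2, 0.6, 0.8):
    print(p_flip, n_components(noisy_F(5, p_flip, rng)))
```

Because the columns are centred, the count can never exceed the number of patterns minus one (31 for $l = 5$). Running the loop over small values of $p$ is one simple way to probe the range $0.0 < p < 0.2$ raised above.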



$$ \frac{dx}{dt} = \lambda x + y, \qquad \frac{dy}{dt} = x + \lambda y, \tag{42} $$

where $\lambda$ is a real parameter. According to the theory of ordinary differential equations [23], the qualitative analysis of this
system may be done through the analysis of the eigenvalues/eigenvectors of the matrix of the above system:

$$ A = \begin{pmatrix} \lambda & 1 \\ 1 & \lambda \end{pmatrix}. \tag{43} $$

The eigenvalues are given by:

$$ \alpha_1 = \lambda + 1, \quad \alpha_2 = \lambda - 1. \tag{44} $$

We observe that the value $\lambda = 1$ is a critical one because, for $\lambda > 1$, both eigenvalues are positive and the origin $(0, 0)$ is a
repeller, whereas for $-1 < \lambda < 1$ the eigenvalues have opposite signs and we observe a saddle point. Thus, we have a jump,
that is, a sudden change in the system behavior, at $\lambda = 1$.
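The dependence of the qualitative behavior on the parameter can be checked numerically from the signs of the eigenvalues in (44). The sketch below uses the standard classification of a linear system with real eigenvalues (both negative: attractor; both positive: repeller; opposite signs: saddle); the function name `classify_origin` and the tolerance are hypothetical.

```python
import numpy as np

def classify_origin(lam, tol=1e-12):
    """Classify the origin of dx/dt = lam*x + y, dy/dt = x + lam*y."""
    eig = np.linalg.eigvalsh(np.array([[lam, 1.0], [1.0, lam]]))  # ascending: lam - 1, lam + 1
    if np.any(np.abs(eig) < tol):
        return "degenerate"        # a zero eigenvalue: a critical parameter value
    if np.all(eig < 0):
        return "attractor"
    if np.all(eig > 0):
        return "repeller"
    return "saddle"

for lam in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(lam, classify_origin(lam))
```

Sweeping the parameter this way shows that the character of the origin changes only where one of the eigenvalues crosses zero.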

Cellular automata are discrete dynamical systems for which the probability $p$ could be seen as a parameter that ranges in
$[0, 1]$. Would there be critical values in this case? If the answer is "yes", which property suddenly changes? Could it be the
number of principal components?

These questions, and the mathematical theory necessary to perform such analysis, are points that we shall consider in further
work.
λ1    30.993277   14.347278   27.368784   28.491445
λ2    17.127385   13.29037    16.960658   17.279648
λ3    13.541144   12.885112   14.959948   14.604408
λ4    13.180114   11.816202   13.281494   13.752977
λ5    12.388441   11.678309   11.357819   12.844465
λ6    10.902415   11.37782    10.857413   10.829726
λ7    10.293973   11.020907   9.959074    9.7779532
λ8    9.90735     10.419589   9.5077103   9.2701578
λ9    9.2822227   10.270046   9.3582306   8.8721786
λ10   8.7295329   9.8165154   8.8984414   8.8107682
λ11   8.529595    9.2160415   8.5837995   8.2016704
λ12   7.8843864   8.9368721   8.3232351   8.0601548
λ13   7.6179283   8.8871718   8.1230224   7.6299203
λ14   7.0733465   8.4223863   7.8908832   7.5448918
λ15   6.8634222   8.2353647   7.401953    7.3142038
λ16   6.7723605   7.9216589   7.2366057   7.066622
λ17   6.4657069   7.4769038   6.6688247   6.635226
λ18   6.3216585   7.0701685   6.190632    6.1886655
λ19   6.1603787   6.7891207   6.1204134   6.0738189
λ20   5.8044919   6.6787658   5.7734589   5.8293511
λ21   5.6959129   6.2790179   5.7684386   5.5555388
λ22   5.3941432   6.0714589   5.2569218   5.472738
λ23   4.9930157   5.8968197   4.9528161   5.2651065
λ24   4.799646    5.6406084   4.7506673   4.896581
λ25   4.5991479   5.5455414   4.5796108   4.7932072
λ26   4.3609133   5.2887732   4.5370786   4.7514249
λ27   ...         ...         ...         ...
λ28   3.9754383   4.9468391   4.0258767   3.7535244
λ29   3.7421531   4.826271    3.850906    3.6713829
λ30   3.3309643   4.362729    3.7734524   3.4443371
λ31   3.1165726   3.4774291   3.3800895   3.0247033
λ32   7.075E-15   5.587E-15   5.641E-15   7.235E-15

Table 3. Principal components for patterns with l = 5 and noise with probability distribution given by p.

6. Final Comments 

In this paper we review the application of PCA to cellular automata analysis. We follow the work presented in [14], but in
the presence of noise which randomly reverses the CA output values with probability $p$. We observe that, for a fixed pattern
size, the noise intensity ($p$ value) did not seem to have a considerable effect on the number of principal components if $p \ge 0.2$.
The observed (and expected) influence is that the number of principal components increases. For example, if $l = 5$ and
$p = 0.0$, the main result of [14] says that this number is 7, while for $p \in \{0.2, 0.6, 0.8\}$ this number increases to 31. The
obvious question is: what happens for $0.0 < p < 0.2$?

This and other questions about parameter sensitivity analysis for cellular automata must be answered in further work.
...
λ18   ...         5.2349427   4.519477    4.6544779
λ19   4.3368357   5.1286      4.4405929   4.3767439
λ20   4.289522    5.0877569   4.3612835   4.255288
λ21   4.0221739   4.8696669   4.1463912   4.1147127
λ22   4.0053207   4.7376638   4.0277528   3.9599816
λ23   3.9195876   4.6631356   3.9294454   3.8759049
λ24   3.8132963   4.5676061   3.8562459   3.7592389
λ25   3.6929406   4.5192326   3.7291963   3.6175464
λ26   ...         ...         ...         ...
λ27   3.5245448   4.2816878   3.4551909   3.4851947
λ28   3.4689985   4.1503676   3.4185097   3.4233569
λ29   3.3229986   4.0343815   3.2894165   3.376657
λ30   3.2214124   3.905456    3.2094177   3.247659
λ31   3.1659706   3.7940953   3.1942609   3.2206595

Table 4. The first 31 principal components for patterns with l = 6 and noise with probability distribution
given by p.
Table 5. The last 33 principal components for patterns with l = 6 and noise with probability distribution
given by p.



